分布式培训已成为培训大型神经网络(NN)模型的普遍性和有效的方法,该模型加工大规模数据。然而,满足来自各种NN模型,多样化计算资源的要求以及在培训工作期间的动态变化是非常挑战的。在这项研究中,我们在系统的端到端视图中设计了我们的分布式训练框架,以提供不同场景的内置自适应能力,特别是对于工业应用和生产环境,通过完全考虑资源分配,模型分区,任务放置和分布式执行。基于统一的分布式图和统一群集对象,我们的自适应框架配备了全球成本模型和全局计划者,可以实现任意并行,资源感知的放置,多模式执行,容错和弹性分布式。训练。实验表明,我们的框架可以满足应用程序的多样性和资源的异质性满足各种要求和具有竞争力的性能。具有260亿参数的Ernie语言模型在数千个AI处理器上有效地培训,可扩展性较弱的91.7%。通过采用异质管道异步执行,从推荐系统的模型的吞吐量可以分别增加到2.1倍,仅增加了GPU和CPU培训的3.3倍。此外,容错和弹性分布式培训已成功应用于在线工业应用,这减少了长期培训工作的数量,增加了34.49%,并在全球调度效率增加了33.91%生产环境。
translated by 谷歌翻译
Finding the mixed Nash equilibria (MNE) of a two-player zero sum continuous game is an important and challenging problem in machine learning. A canonical algorithm to finding the MNE is the noisy gradient descent ascent method which in the infinite particle limit gives rise to the {\em Mean-Field Gradient Descent Ascent} (GDA) dynamics on the space of probability measures. In this paper, we first study the convergence of a two-scale Mean-Field GDA dynamics for finding the MNE of the entropy-regularized objective. More precisely we show that for any fixed positive temperature (or regularization parameter), the two-scale Mean-Field GDA with a {\em finite} scale ratio converges to exponentially to the unique MNE without assuming the convexity or concavity of the interaction potential. The key ingredient of our proof lies in the construction of new Lyapunov functions that dissipate exponentially along the Mean-Field GDA. We further study the simulated annealing of the Mean-Field GDA dynamics. We show that with a temperature schedule that decays logarithmically in time the annealed Mean-Field GDA converges to the MNE of the original unregularized objective function.
translated by 谷歌翻译
Deep operator network (DeepONet) has demonstrated great success in various learning tasks, including learning solution operators of partial differential equations. In particular, it provides an efficient approach to predict the evolution equations in a finite time horizon. Nevertheless, the vanilla DeepONet suffers from the issue of stability degradation in the long-time prediction. This paper proposes a {\em transfer-learning} aided DeepONet to enhance the stability. Our idea is to use transfer learning to sequentially update the DeepONets as the surrogates for propagators learned in different time frames. The evolving DeepONets can better track the varying complexities of the evolution equations, while only need to be updated by efficient training of a tiny fraction of the operator networks. Through systematic experiments, we show that the proposed method not only improves the long-time accuracy of DeepONet while maintaining similar computational cost but also substantially reduces the sample size of the training set.
translated by 谷歌翻译
Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of the fully trained dense networks at initialization, without any optimization of the weights of the network (i.e., untrained networks). However, the presence of such untrained subnetworks in graph neural networks (GNNs) still remains mysterious. In this paper we carry out the first-of-its-kind exploration of discovering matching untrained GNNs. With sparsity as the core tool, we can find \textit{untrained sparse subnetworks} at the initialization, that can match the performance of \textit{fully trained dense} GNNs. Besides this already encouraging finding of comparable performance, we show that the found untrained subnetworks can substantially mitigate the GNN over-smoothing problem, hence becoming a powerful tool to enable deeper GNNs without bells and whistles. We also observe that such sparse untrained subnetworks have appealing performance in out-of-distribution detection and robustness of input perturbations. We evaluate our method across widely-used GNN architectures on various popular datasets including the Open Graph Benchmark (OGB).
translated by 谷歌翻译
The diverse demands of different summarization tasks and their high annotation costs are driving a need for few-shot summarization. However, despite the emergence of many summarization tasks and datasets, the current training paradigm for few-shot summarization systems ignores potentially shareable knowledge in heterogeneous datasets. To this end, we propose \textsc{UniSumm}, a unified few-shot summarization model pre-trained with multiple summarization tasks and can be prefix-tuned to excel at any few-shot summarization datasets. Meanwhile, to better evaluate few-shot summarization systems, under the principles of diversity and robustness, we assemble and publicize a new benchmark \textsc{SummZoo}. It consists of $8$ diverse summarization tasks with multiple sets of few-shot samples for each task, covering both monologue and dialogue domains. Experimental results and ablation studies show that \textsc{UniSumm} outperforms strong baseline systems by a large margin across all tasks in \textsc{SummZoo} under both automatic and human evaluations. We release our code and benchmark at \url{https://github.com/microsoft/UniSumm}.
translated by 谷歌翻译
Controllable summarization allows users to generate customized summaries with specified attributes. However, due to the lack of designated annotations of controlled summaries, existing works have to craft pseudo datasets by adapting generic summarization benchmarks. Furthermore, most research focuses on controlling single attributes individually (e.g., a short summary or a highly abstractive summary) rather than controlling a mix of attributes together (e.g., a short and highly abstractive summary). In this paper, we propose MACSum, the first human-annotated summarization dataset for controlling mixed attributes. It contains source texts from two domains, news articles and dialogues, with human-annotated summaries controlled by five designed attributes (Length, Extractiveness, Specificity, Topic, and Speaker). We propose two simple and effective parameter-efficient approaches for the new task of mixed controllable summarization based on hard prompt tuning and soft prefix tuning. Results and analysis demonstrate that hard prompt models yield the best performance on all metrics and human evaluations. However, mixed-attribute control is still challenging for summarization tasks. Our dataset and code are available at https://github.com/psunlpgroup/MACSum.
translated by 谷歌翻译
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
张张量强大的主成分分析(TRPCA)旨在恢复因稀疏噪声破坏的低排名张量,在许多真实应用中引起了很多关注。本文开发了一种新的全球加权TRPCA方法(GWTRPCA),该方法是第一种同时考虑额外域内切片和额叶间切片奇异值的重要性。利用这些全球信息,GWTRPCA惩罚了较大的单数值,并为其分配了较小的权重。因此,我们的方法可以更准确地恢复低管级组件。此外,我们提出了通过改良的考奇估计量(MCE)的有效自适应学习策略,因为重量设置在GWTRPCA的成功中起着至关重要的作用。为了实现GWTRPCA方法,我们使用乘数的交替方向方法(ADMM)方法设计了一种优化算法。对现实世界数据集的实验验证了我们提出的方法的有效性。
translated by 谷歌翻译
在纠缠和连贯性等计量学中利用量子效应使人们可以测量具有增强灵敏度的参数。但是,时间依赖性的噪声会破坏这种海森堡限制的扩增。我们提出了一种基于量子信号处理框架,以克服这些现实的噪声诱导的实践量子计量学限制。我们的算法将门参数$ \ varphi $〜(单量Z阶段)分开,该算法易受时间依赖性错误与目标门参数$ \ theta $〜(| 10>和| 01> state之间的交换 - 角)易受时间依赖时间的错误。这在很大程度上没有时间依赖性误差。我们的方法实现了$ 10^{ - 4} $径向的准确性,用于学习超导级实验的$ \ theta $,以优于两个数量级的现有替代方案。我们还通过快速的傅立叶变换和顺序相位差异证明了学习时间依赖性栅极参数的鲁棒性。我们从理论和数字上均显示出最佳计量方差缩放的有趣过渡,这是电路深度$ d $的函数,从预抗态度制度$ d \ ll 1/\ theta $ to to Heisenberg限制$ d \ to \ to \ $ $。值得注意的是,在临时策略中,我们的方法对时间敏感参数$ \ varphi $比例的估计差异比渐近的海森伯格限制快速限制为深度的函数,$ \ text {var}(\ hat {\ varphi})\ aid 1/d^4 $。我们的工作是第一个证明在实验室量子计算机中实用应用的量子信号处理算法。
translated by 谷歌翻译
轨迹预测对于自动驾驶汽车(AV)是必不可少的,以计划正确且安全的驾驶行为。尽管许多先前的作品旨在达到更高的预测准确性,但很少有人研究其方法的对抗性鲁棒性。为了弥合这一差距,我们建议研究数据驱动的轨迹预测系统的对抗性鲁棒性。我们设计了一个基于优化的对抗攻击框架,该框架利用精心设计的可区分动态模型来生成逼真的对抗轨迹。从经验上讲,我们基于最先进的预测模型的对抗性鲁棒性,并表明我们的攻击使通用指标和计划感知指标的预测错误增加了50%以上和37%。我们还表明,我们的攻击可以导致AV在模拟中驶离道路或碰撞到其他车辆中。最后,我们演示了如何使用对抗训练计划来减轻对抗性攻击。
translated by 谷歌翻译